AITopics | explore-then-commit strategy

On Explore-Then-Commit strategies

Neural Information Processing SystemsNov-21-2025, 15:34:04 GMT

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known. Besides the main message, we also refine existing deviation inequalities, which allow us to design fully sequential strategies with finite-time regret guarantees that are (a) asymptotically optimal as the horizon grows and (b) order-optimal in the minimax sense. Furthermore we provide empirical evidence that the theory also holds in practice and discuss extensions to non-gaussian and multiple-armed case.

explore-then-commit strategy, name change, proceedings, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.42)

Add feedback

On Explore-Then-Commit Strategies

Neural Information Processing SystemsNov-21-2025, 10:42:01 GMT

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
(3 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.51)

Add feedback

Reviews: On Explore-Then-Commit strategies

Neural Information Processing SystemsJan-20-2025, 22:13:38 GMT

One interesting part of the paper to me is the strategy in Algorithm 5 that is asymptotically optimal and optimal in the minimax sense. Although the strategy is not very original (a small modification of a strategy suggested by Lai), it is a new observation for a strategy to be asymptotically optimal and minimax optimal. However, it should be noted that it is optimal only in the case of two arms, which is a major drawback. The authors are encouraged to improve on this result. This idea is new to the best of my knowledge.

explore-then-commit strategy, inequality, straight forward, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.62)

Add feedback

On Explore-Then-Commit Strategies

Neural Information Processing SystemsMar-12-2024, 19:14:38 GMT

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known. Besides the main message, we also refine existing deviation inequalities, which allow us to design fully sequential strategies with finite-time regret guarantees that are (a) asymptotically optimal as the horizon grows and (b) order-optimal in the minimax sense. Furthermore we provide empirical evidence that the theory also holds in practice and discuss extensions to non-gaussian and multiple-armed case.

algorithm, bandit problem, fixed-design strategy, (14 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
(3 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.51)

Add feedback

On Explore-Then-Commit strategies

Garivier, Aurelien, Lattimore, Tor, Kaufmann, Emilie

Neural Information Processing SystemsFeb-14-2020, 06:44:00 GMT

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known. Besides the main message, we also refine existing deviation inequalities, which allow us to design fully sequential strategies with finite-time regret guarantees that are (a) asymptotically optimal as the horizon grows and (b) order-optimal in the minimax sense. Furthermore we provide empirical evidence that the theory also holds in practice and discuss extensions to non-gaussian and multiple-armed case.

explore-then-commit strategy

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.54)

Add feedback

Linear Bandits with Feature Feedback

Oswal, Urvashi, Bhargava, Aniruddha, Nowak, Robert

arXiv.org Machine LearningMar-11-2019

This paper explores a new form of the linear bandit problem in which the algorithm receives the usual stochastic rewards as well as stochastic feedback about which features are relevant to the rewards, the latter feedback being the novel aspect. The focus of this paper is the development of new theory and algorithms for linear bandits with feature feedback. We show that linear bandits with feature feedback can achieve regret over time horizon $T$ that scales like $k\sqrt{T}$, without prior knowledge of which features are relevant nor the number $k$ of relevant features. In comparison, the regret of traditional linear bandits is $d\sqrt{T}$, where $d$ is the total number of (relevant and irrelevant) features, so the improvement can be dramatic if $k\ll d$. The computational complexity of the new algorithm is proportional to $k$ rather than $d$, making it much more suitable for real-world applications compared to traditional linear bandits. We demonstrate the performance of the new algorithm with synthetic and real human-labeled data.

data mining, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

1903.03705

Country: North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.35)

Add feedback

On Explore-Then-Commit strategies

Garivier, Aurelien, Lattimore, Tor, Kaufmann, Emilie

Neural Information Processing SystemsDec-31-2016

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known. Besides the main message, we also refine existing deviation inequalities, which allow us to design fully sequential strategies with finite-time regret guarantees that are (a) asymptotically optimal as the horizon grows and (b) order-optimal in the minimax sense. Furthermore we provide empirical evidence that the theory also holds in practice and discuss extensions to non-gaussian and multiple-armed case.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country: